from the observed sequences how probable each mutation is at each position, i.e. compile
a very large table (more precisely: a matrix of transition probabilities) and then calculate
the phylogenetic tree. This third method is particularly computationally intensive and
time-consuming, but it is usually also the most accurate.
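To make the idea of such a table concrete, here is a minimal Python sketch. It assumes a small, gap-free toy alignment and pools all positions into one table, which is a strong simplification; real maximum likelihood programs fit the parameters of a substitution model instead of counting pairs like this. The function name `substitution_matrix` and the toy sequences are illustrative assumptions, not taken from any particular tool.

```python
from collections import Counter
from itertools import combinations

NUCLEOTIDES = "ACGT"


def substitution_matrix(alignment):
    """Estimate P(b | a) from column-wise pair counts in an alignment.

    `alignment` is a list of equal-length, gap-free DNA strings
    (a simplifying assumption for this sketch).
    """
    counts = Counter()
    for seq_a, seq_b in combinations(alignment, 2):
        for a, b in zip(seq_a, seq_b):
            # The direction of a change is unknown, so every observed
            # pair is counted both ways (the count table is symmetric).
            counts[(a, b)] += 1
            counts[(b, a)] += 1

    matrix = {}
    for a in NUCLEOTIDES:
        row_total = sum(counts[(a, b)] for b in NUCLEOTIDES)
        # Normalise each row so that it sums to 1 (or stays 0 if unseen).
        matrix[a] = {
            b: (counts[(a, b)] / row_total if row_total else 0.0)
            for b in NUCLEOTIDES
        }
    return matrix


toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]  # hypothetical data
probabilities = substitution_matrix(toy_alignment)
for a in NUCLEOTIDES:
    print(a, [round(probabilities[a][b], 2) for b in NUCLEOTIDES])
```

Each row of the result gives, for one nucleotide, the estimated probability of seeing each of the four nucleotides at the same position in another sequence.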
In practice, it is important to remember that the faster methods are also more easily off
the mark when things get complicated. Depending on the calculation rule used, the result is
distorted more or less easily. This happens especially when sequences of different lengths
are compared or when a single sequence has diverged so strongly that it sits on a very long
branch (“long branch attraction”). The infobox summarizes a number of tools.
Thus, bioinformatics enables us to describe evolution more precisely and to understand
important aspects of it by analysing many such phylogenetic trees, but also genomes and,
in particular, by taking a detailed look at individual gene families. By analysing the amino
acid sequences involved, together with the available structural data of important enzymes,
it is possible to describe exactly how they function, which amino acid residues are
important for the chemical reaction they catalyse and which functional subunits they
consist of. These subunits are also known as protein domains. They are typically 100–150
amino acids long, fold stably (hence their size: if they were longer they would fold into
several separate sections, if they were shorter they would not fold at all) and each has a
specific function. For example, there are catalytic domains, regulatory domains,
interaction domains, those that bind cofactors (often vitamins), and those that give the
protein a solid structure (e.g. fibrils or fibers). Looking at protein families can shed
light on how a protein function changes or adapts across different organisms and how, for
example, additional mutations can turn a catalytic domain into a regulatory domain.
Phylogenetics Tools
Phylogeny
Family trees resemble real trees if they have a clear root (origin), which can be
established, for example, by including a distant species (“outgroup”).
Basically, there are three ways to calculate family trees:
• Neighbor joining: direct neighbours are merged step by step and the tree is
calculated from them. This can be done quickly and is implemented efficiently in
the CLUSTALW software, for example.
• Parsimony tries to calculate the family tree with as few mutations as possible.
This is already more computationally expensive.
• Maximum likelihood is the most computationally expensive procedure. Each
nucleotide exchange is weighted according to its (often estimated) probability
and then the most probable phylogenetic tree is calculated.
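The parsimony idea from the list above can be sketched in a few lines of Python. The example counts the minimum number of mutations needed on a given tree using Fitch's algorithm, one standard way to score a fixed tree, and compares two candidate trees. The nested-tuple tree format, the function names and the four toy sequences are assumptions made for illustration; real programs additionally have to search over many possible trees, which is where most of the computing time goes.

```python
def fitch_score(tree, leaf_states):
    """Return the minimum number of changes for one alignment column.

    `tree` is a nested tuple of leaf names, e.g. (("A", "B"), ("C", "D"));
    `leaf_states` maps each leaf name to its nucleotide in this column.
    """
    def visit(node):
        if isinstance(node, str):          # leaf: its state is observed
            return {leaf_states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:                         # children agree: no extra change
            return common, left_cost + right_cost
        # children disagree: one additional change is required here
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def parsimony_score(tree, sequences):
    """Sum the Fitch score over all columns of a gap-free alignment."""
    length = len(next(iter(sequences.values())))
    return sum(
        fitch_score(tree, {name: seq[i] for name, seq in sequences.items()})
        for i in range(length)
    )


# Hypothetical toy data: A and B are similar, C and D are similar.
sequences = {"A": "AAGT", "B": "AAGT", "C": "CAGA", "D": "CAGA"}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))
print(parsimony_score(tree_1, sequences))  # 2 mutations
print(parsimony_score(tree_2, sequences))  # 4 mutations
```

Under the parsimony criterion, `tree_1` is preferred because it explains the same sequences with fewer mutations.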
10.4 Describing Evolution: Phylogenetic Trees